Abstract.

This contribution is concerned with mathematical models for the dynamics of the ge- 
netic composition of populations evolving under recombination. Recombination is the ge- 
netic mechanism by which two parent individuals create the mixed type of their offspring 
during sexual reproduction. The corresponding models are large, nonlinear dynamical 
systems (for the deterministic treatment that applies in the infinite-population limit), 
or interacting particle systems (for the stochastic treatment required for finite popula- 
tions). We review recent progress on these difficult problems. In particular, we present 
a closed solution of the deterministic continuous-time system, for the important special 
case of single crossovers; we extract an underlying linearity; we analyse how this carries 
over to the corresponding stochastic setting; and we provide a solution of the analogous 
deterministic discrete-time dynamics, in terms of its generalised eigenvalues and a simple 
recursion for the corresponding coefficients. 
1. Introduction

Biological evolution is a complex phenomenon driven by various processes, such as
mutation and recombination of genetic material, reproduction of individuals, and
selection of favourable types. The area of population genetics is concerned with 
how these processes shape and change the genetic structure of populations. Math- 
ematical population genetics was founded in the 1920's by Ronald Fisher, Sewall 
Wright, and John Haldane, and thus is among the oldest areas of mathematical 
biology. The reason for its continuing (and actually increasing) attractiveness for 
both mathematicians and biologists is at least twofold: First, there is a true need
for mathematical models and methods, since the outcome of evolution is impossible
to predict (and, thus, today's genetic data are impossible to analyse) without
their help. Second, the processes of genetics lend themselves most naturally to a
mathematical formulation and give rise to a wealth of fascinating new problems, 
concepts, and methods. 
This contribution will focus on the phenomenon of recombination, in which two
parent individuals are involved in creating the mixed type of their offspring during
sexual reproduction. The essence of this process is illustrated in Fig. 1 and may
be idealised and summarised as follows.




Figure 1. Life cycle of a population under sexual reproduction and recombination. Each 
line symbolises a sequence of sites that defines a gamete (like the two at the top that 
start the cycle as 'egg' and 'sperm'). The pool of gametes at the left and the right comes 
from a large population of recombining individuals. These sequences meet randomly to 
start the next round of the cycle. 

Genetic information is encoded in terms of sequences of finite length. Eggs and 
sperm (i.e., female and male germ cells or gametes) each carry a single copy of 
such a sequence. They go through the following life cycle: At fertilisation, two 
gametes meet randomly and unite, thus starting the life of a new individual, which 
is equipped with both the maternal and the paternal sequence. At maturity, this 
individual will generate its own germ cells. This process includes recombination, 
that is, the maternal and paternal sequences perform one or more crossovers and 
are cut and relinked accordingly, so that two 'mixed' sequences emerge. These 
are the new gametes and start the next round of fertilisation (by random mating 
within a large population). 

Models of this process aim at describing the dynamics of the genetic composition
of a population that goes through this life cycle repeatedly. These models
come in various flavours: in discrete or continuous time; with various assumptions
about the crossover pattern; and, most importantly, in a deterministic or a stochas- 
tic formulation, depending on whether the population is assumed to be so large 
that stochastic fluctuations may be neglected. In any case, however, the resulting 
process appears difficult to treat, due to the large number of possible states and the 
nonlinearity generated by the random mixture of gametes. Nevertheless, a num- 
ber of solution procedures have been discovered for the deterministic discrete-time 
setting [51 [TUJ Q3] , and the underlying mathematical structures were investigated 
within the framework of genetic algebras, see [18, 19, 22]. Quite generally, the solution
relies on a certain nonlinear transformation (known as Haldane linearisation)
from (gamete or type) frequencies to suitable correlation functions, which decouple 
from each other and decay geometrically. But if sequences of more than three sites 
are involved, this transformation must be constructed via recursions that involve
the parameters of the recombination process, and is not available explicitly except 
in the trivial case of independent sites. For a review of the area, see Ch. V.4]. 

In this contribution, we concentrate on a special case that is both biologically 
and mathematically relevant, namely, the situation in which at most one crossover 
happens at any given time. That is, only recombination events may occur that 
partition the sites of a sequence into two parts that correspond to the sites before 
and after a given crossover point. We analyse the resulting models in continuous 
time (both deterministic and stochastic), as well as in discrete time. For the
deterministic continuous-time system (Section 2), a simple explicit solution can be
given. This simplicity is due to some underlying linearity; actually, the system may
even be diagonalised (via a nonlinear transformation). In Section 3, we consider
the corresponding stochastic process (still in continuous time), namely, the Moran 
model with recombination. This also takes into account the resampling effect that 
comes about via random reproduction in a finite population. In particular, we 
investigate the relationship between the expectation of the Moran model and the 
solution of the deterministic continuous-time model. We finally tackle deterministic
single-crossover dynamics in discrete time (Section 4). This setting implies
additional dependencies, which become particularly transparent when the so-called 
ancestral recombination process is considered. A solution may still be given, but 
its coefficients must be determined recursively. 

Altogether, it will turn out that the corresponding models, and their analysis, 
have various mathematical facets that are intertwined with each other, such as
differential equations, probability theory, and combinatorics. 



2. Deterministic dynamics, continuous time 

2.1. The model. We describe populations at the level of their gametes and 
thus identify gametes with individuals. Their genetic information is encoded in 
terms of a linear arrangement of sites, indexed by the set $S := \{0, 1, \dots, n\}$. For
each site $i \in S$, there is a set $X_i$ of 'letters' that may possibly occur at that site. To
allow for a convenient notation, we restrict ourselves to the simple but important
case of finite sets $X_i$; for the full generality of arbitrary locally compact spaces $X_i$,
the reader is referred to [3] and [5].
A type is thus defined as a sequence

$$x = (x_0, x_1, \dots, x_n) \in X_0 \times X_1 \times \cdots \times X_n =: X,$$

where $X$ is called the type space. By construction, $x_i$ is the $i$-th coordinate of
$x$, and we define $x_I := (x_i)_{i \in I}$ as the collection of coordinates with indices in $I$,
where $I$ is a subset of $S$. A population is identified with a non-negative measure
$\omega$ on $X$. Namely, $\omega(\{x\})$ denotes the frequency of individuals of type $x \in X$ and
$\omega(A) := \sum_{x \in A} \omega(\{x\})$ for $A \subseteq X$; we abbreviate $\omega(\{x\})$ as $\omega(x)$. The set of all
non-negative measures on $X$ is denoted by $\mathcal{M}_{\geq 0}(X)$. If we define $\delta_x$ as the point
measure on $x$ (i.e., $\delta_x(y) = \delta_{x,y}$ for $x, y \in X$), we can also write $\omega = \sum_{x \in X} \omega(x)\,\delta_x$.
We may, alternatively, interpret $\delta_x$ as the basis vector of $\mathbb{R}_{\geq 0}^{|X|}$ that corresponds to
$x$ (where a suitable ordering of types is implied, and $|X|$ is the number of elements
in $X$); $\omega$ is thus identified with a vector in $\mathbb{R}_{\geq 0}^{|X|}$.

At this stage, frequencies need not be normalised; $\omega(x)$ may simply be thought
of as the size of the subpopulation of type $x$, measured in units so large that it
may be considered a continuous quantity. The corresponding normalised version
$p := \omega/\|\omega\|$ (where $\|\omega\| := \sum_{x \in X} \omega(x) = \omega(X)$ is the total population size) is then
a probability distribution on $X$, and may be identified with a probability vector.

Recombination acts on the links between the sites; the links are collected into
the set $L := \{\tfrac{1}{2}, \tfrac{3}{2}, \dots, \tfrac{2n-1}{2}\}$. We shall use Latin indices for the sites and Greek
indices for the links, and the implicit rule will always be that $\alpha = \tfrac{2i+1}{2}$ is the link
between sites $i$ and $i + 1$; see Figure 2.

Figure 2. Sites and links.

Let recombination happen in every individual, and at every link $\alpha \in L$, at
rate $\varrho_\alpha > 0$. More precisely, for every $\alpha \in L$, every individual exchanges, at rate
$\varrho_\alpha/2$, the sites after link $\alpha$ with those of a randomly chosen partner. Explicitly,
if the 'active' and the partner individual are of types $x$ and $y$, then the new pair
has types $(x_0, x_1, \dots, x_{\lfloor\alpha\rfloor}, y_{\lceil\alpha\rceil}, \dots, y_n)$ and $(y_0, y_1, \dots, y_{\lfloor\alpha\rfloor}, x_{\lceil\alpha\rceil}, \dots, x_n)$, where
$\lfloor\alpha\rfloor$ ($\lceil\alpha\rceil$) is the largest integer below $\alpha$ (the smallest above $\alpha$); see Fig. 3. Since
every individual can occur as either the 'active' individual or as its randomly chosen
partner, we have a total rate of $\varrho_\alpha$ for crossovers at link $\alpha$. For later use, we also
define $\varrho := \sum_{\alpha \in L} \varrho_\alpha$.
In order to formulate the corresponding model, let us introduce the projection
operators $\pi_i$, $i \in S$, via

$$\pi_i : X_0 \times X_1 \times \cdots \times X_n \to X_i, \qquad (x_0, x_1, \dots, x_n) \mapsto x_i, \qquad (1)$$

i.e., $\pi_i$ is the canonical projection to the $i$-th coordinate. Likewise, for any index
set $I \subseteq S$, one defines a projector

$$\pi_I : X \to \prod_{i \in I} X_i =: X_I, \qquad (x_0, x_1, \dots, x_n) \mapsto (x_i)_{i \in I} =: x_I.$$

We shall frequently use the abbreviations $\pi_{<\alpha} := \pi_{\{0, \dots, \lfloor\alpha\rfloor\}}$ and $\pi_{>\alpha} := \pi_{\{\lceil\alpha\rceil, \dots, n\}}$,
as well as $x_{<\alpha} := \pi_{<\alpha}(x)$, $x_{>\alpha} := \pi_{>\alpha}(x)$. The projectors $\pi_{<\alpha}$ and $\pi_{>\alpha}$ may be
thought of as cut and forget operators because they take the leading or trailing
segment of a sequence $x$, and forget about the rest.

Figure 3. Upper panel: Recombination between individuals of type $x$ and $y$. Lower panel:
The corresponding 'marginalised' version that summarises all events by which individuals
of type $x$ are gained or lost (a '$*$' at site $i$ stands for an arbitrary element of $X_i$). Note
that, in either case, the process can go both ways, as indicated by the arrows.

Whereas the $\pi_I$ act on the types, we also need the induced mapping at the level
of the population, namely,

$$\pi_I. : \mathcal{M}_{\geq 0}(X) \to \mathcal{M}_{\geq 0}(X_I), \qquad \omega \mapsto \omega \circ \pi_I^{-1} =: \pi_I.\omega, \qquad (2)$$

where $\pi_I^{-1}$ denotes the preimage under $\pi_I$. The operation $.$ (where the dot is on the
line) is the 'pullback' of $\pi_I$ w.r.t. $\omega$; so, $\pi_I.\omega$ is the marginal distribution of $\omega$ with
respect to the sites in $I$. In particular, $(\pi_{<\alpha}.\omega)(x_{<\alpha})$ is the marginal frequency of
sequences prescribed at the sites before $\alpha$, and vice versa for the sites after $\alpha$.

Now, single-crossover recombination (at the level of the population) means the
relinking of a randomly chosen leading segment with a randomly chosen trailing
segment. We therefore introduce (elementary) recombination operators (or
recombinators, for short), $R_\alpha : \mathcal{M}_{\geq 0}(X) \to \mathcal{M}_{\geq 0}(X)$ for $\alpha \in L$, defined by

$$R_\alpha(\omega) := \frac{1}{\|\omega\|} \bigl( (\pi_{<\alpha}.\omega) \otimes (\pi_{>\alpha}.\omega) \bigr). \qquad (3)$$

Here, the tensor product reflects the independent combination (i.e., the product
measure) of the two marginals $\pi_{<\alpha}.\omega$ and $\pi_{>\alpha}.\omega$. $R_\alpha$ is therefore a cut and relink
operator. $R_\alpha(\omega)$ may be understood as the population that emerges if all individuals
of the population $\omega$ disintegrate into their leading and trailing segments and
these are relinked randomly. Note that $\|R_\alpha(\omega)\| = \|\omega\|$.
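The cut and relink operation can be sketched in a few lines of Python. This is a minimal illustration only, under assumptions that are not part of the text: a population is stored as a dictionary from type tuples to non-negative weights, and a link is encoded by the integer position `cut` of the first site after it (so `cut = 1` stands for the link 1/2). The names `marginal` and `recombinator` and the example weights are hypothetical.

```python
from collections import defaultdict
from itertools import product

def marginal(omega, sites):
    """Marginal measure of omega with respect to the given site indices."""
    out = defaultdict(float)
    for x, w in omega.items():
        out[tuple(x[i] for i in sites)] += w
    return dict(out)

def recombinator(omega, cut):
    """Cut and relink: product of the leading and trailing marginals, divided by the total mass."""
    n_sites = len(next(iter(omega)))
    total = sum(omega.values())
    lead = marginal(omega, range(cut))            # sites before the link
    trail = marginal(omega, range(cut, n_sites))  # sites after the link
    return {a + b: wa * wb / total
            for a, wa in lead.items() for b, wb in trail.items()}

# Three sites with two letters each; unnormalised, illustrative weights.
omega = {x: float(1 + x[0] + 2 * x[1] * x[2]) for x in product((0, 1), repeat=3)}
r1 = recombinator(omega, 1)  # recombination at the link between sites 0 and 1
```

In this sketch the total mass and the marginal at site 0 are left unchanged by `recombinator(omega, 1)`, in line with the norm identity stated above.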

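The recombination dynamics may thus be compactly written as

$$\dot{\omega}_t = \sum_{\alpha \in L} \varrho_\alpha \bigl( R_\alpha(\omega_t) - \omega_t \bigr) = \sum_{\alpha \in L} \varrho_\alpha (R_\alpha - \mathbb{1})(\omega_t) =: \Phi(\omega_t), \qquad (4)$$

where $\mathbb{1}$ is the identity operator. Note that (4) is a large, nonlinear system of
ordinary differential equations (ODEs).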
2.2. Solution of the ODE system. The solution of (4) relies on some
elementary properties of our recombinators. Most importantly, they are idempotents
and commute with each other, i.e.,

$$R_\alpha^2 = R_\alpha, \quad \alpha \in L, \qquad (5)$$
$$R_\alpha R_\beta = R_\beta R_\alpha, \quad \alpha, \beta \in L. \qquad (6)$$

These properties are intuitively plausible: if the links before $\alpha$ are already
independent of those after $\alpha$ due to a previous recombination event, then further
recombination at that link does not change the situation; and if a product measure
is formed with respect to two links $\alpha$ and $\beta$, the result does not depend on
the order in which the links are affected. For the proof, we refer to Prop. 2];
let us only mention here that it relies on the elementary fact that, for $\omega \in \mathcal{M}_{\geq 0}(X)$,

$$\pi_{<\alpha}.(R_\beta(\omega)) = \pi_{<\alpha}.\omega \quad \text{for } \beta \geq \alpha, \quad \text{and}$$
$$\pi_{>\alpha}.(R_\beta(\omega)) = \pi_{>\alpha}.\omega \quad \text{for } \beta \leq \alpha;$$

that is, recombination at or after $\alpha$ does not affect the marginal frequencies at
sites before $\alpha$, and vice versa.

We now define composite recombinators as

$$R_G := \prod_{\alpha \in G} R_\alpha \quad \text{for } G \subseteq L.$$

Here, the product is to be read as composition; it is, indeed, a product if the
recombinators are written in terms of their multilinear matrix representations,
which are available in the case of finite types considered here (see [2]). By property
(6), the order in the composition plays no role. Furthermore, (5) and (6) obviously
entail $R_G R_H = R_{G \cup H}$ for $G, H \subseteq L$.
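The idempotence, commutation and union rules can be checked numerically on a small example. The sketch below is not a proof; it assumes the same hypothetical encoding as before (type tuples as dictionary keys, a link given by the index `cut` of the first site after it), and the helper names `composite` and `close` are introduced only for illustration.

```python
from collections import defaultdict
from itertools import product

def recombinator(omega, cut):
    """Elementary recombinator for the link between sites cut-1 and cut."""
    total = sum(omega.values())
    lead, trail = defaultdict(float), defaultdict(float)
    for x, w in omega.items():
        lead[x[:cut]] += w
        trail[x[cut:]] += w
    return {a + b: wa * wb / total
            for a, wa in lead.items() for b, wb in trail.items()}

def composite(omega, cuts):
    """Composite recombinator: apply the elementary ones for all listed links in turn."""
    for cut in cuts:
        omega = recombinator(omega, cut)
    return omega

def close(mu, nu, tol=1e-9):
    """True if two measures agree on every type up to the tolerance."""
    return all(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) < tol for x in set(mu) | set(nu))

# Four sites, two letters each; illustrative weights that are not of product form.
omega = {x: float(1 + x[0] + 2 * x[1] * x[2] + 3 * x[0] * x[3])
         for x in product((0, 1), repeat=4)}

idempotent = close(recombinator(recombinator(omega, 2), 2), recombinator(omega, 2))
commute = close(composite(omega, [1, 3]), composite(omega, [3, 1]))
union = close(composite(composite(omega, [1, 2]), [2, 3]), composite(omega, [1, 2, 3]))
```

All three flags should come out `True` for this example, which is what properties (5) and (6) and the union rule predict.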

With this in hand, we can now state an explicit solution of our problem, namely,

Theorem 1. The solution of the single-crossover dynamics (4) with initial value
$\omega_0$ can be given in closed form as

$$\omega_t = \sum_{G \subseteq L} a_G(t)\, R_G(\omega_0) =: \varphi_t(\omega_0) \qquad (7)$$

with coefficient functions

$$a_G(t) = \prod_{\alpha \in L \setminus G} e^{-\varrho_\alpha t} \prod_{\beta \in G} \bigl( 1 - e^{-\varrho_\beta t} \bigr),$$

i.e., $\varphi_t$ is the semigroup belonging to the recombination equation (4). □

For the proof, the reader is referred to [3 Thm. 2] or [3l Thm. 3] (the former 
article contains the original, the latter a shorter and more elegant version of the 
proof). Let us note that the coefficient functions can be interpreted probabilistically.
Given an individual sequence in the population, $a_G(t)$ is the probability
that the set of links that have seen at least one crossover event until time $t$ is
precisely the set $G$ (obviously, $\sum_{G \subseteq L} a_G(t) = 1$). Note that the product structure
of the $a_G(t)$ implies independence of links, a decisive feature of the single-crossover
dynamics in continuous time, as we shall see later on. Note also that, as $t \to \infty$,
$\omega_t$ converges to the stationary state

$$\omega_\infty = \frac{1}{\|\omega_0\|^{n}} \bigotimes_{i=0}^{n} (\pi_i.\omega_0), \qquad (8)$$

in which all sites are independent.
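As a plausibility check, the closed form can be compared with a direct numerical integration of the recombination equation. The following sketch assumes the product form of the coefficients, that is, each link has been hit by time $t$ with probability $1 - e^{-\varrho_\alpha t}$, independently of the others. The encoding of links by cut positions, the classical Runge-Kutta integrator, the rates and the initial weights are all illustrative choices, not taken from the text.

```python
from collections import defaultdict
from itertools import combinations, product
from math import exp

def recombinator(omega, cut):
    """Elementary recombinator for the link between sites cut-1 and cut."""
    total = sum(omega.values())
    lead, trail = defaultdict(float), defaultdict(float)
    for x, w in omega.items():
        lead[x[:cut]] += w
        trail[x[cut:]] += w
    return {a + b: wa * wb / total
            for a, wa in lead.items() for b, wb in trail.items()}

def rhs(omega, rates):
    """Right-hand side of the ODE: sum over links of rate * (R(omega) - omega)."""
    out = {x: 0.0 for x in omega}
    for cut, rho in rates.items():
        r = recombinator(omega, cut)
        for x in out:
            out[x] += rho * (r.get(x, 0.0) - omega[x])
    return out

def closed_form(omega0, rates, t):
    """Sum over all link subsets G of a_G(t) * R_G(omega0)."""
    links = sorted(rates)
    out = {x: 0.0 for x in omega0}
    for k in range(len(links) + 1):
        for G in combinations(links, k):
            a = 1.0
            for cut in links:
                hit = 1.0 - exp(-rates[cut] * t)   # link has seen a crossover by time t
                a *= hit if cut in G else 1.0 - hit
            r = omega0
            for cut in G:
                r = recombinator(r, cut)
            for x in out:
                out[x] += a * r.get(x, 0.0)
    return out

def rk4(omega0, rates, t, steps=400):
    """Classical fourth-order Runge-Kutta integration of the ODE up to time t."""
    h, w = t / steps, dict(omega0)
    for _ in range(steps):
        k1 = rhs(w, rates)
        k2 = rhs({x: w[x] + 0.5 * h * k1[x] for x in w}, rates)
        k3 = rhs({x: w[x] + 0.5 * h * k2[x] for x in w}, rates)
        k4 = rhs({x: w[x] + h * k3[x] for x in w}, rates)
        w = {x: w[x] + h * (k1[x] + 2 * k2[x] + 2 * k3[x] + k4[x]) / 6 for x in w}
    return w

omega0 = {x: float(1 + x[0] + 2 * x[1] * x[2]) for x in product((0, 1), repeat=3)}
rates = {1: 0.7, 2: 1.3}   # hypothetical rates for the links 1/2 and 3/2
exact = closed_form(omega0, rates, 1.5)
numeric = rk4(omega0, rates, 1.5)
max_diff = max(abs(exact[x] - numeric[x]) for x in omega0)
```

With these settings `max_diff` should be at the level of the integration error only.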

2.3. Underlying linearity. The simplicity of the solution in Theorem 1
comes as a certain surprise. After all, explicit solutions to large, nonlinear ODE
systems are rare; they are usually available for linear systems at best. For this
reason, the recombination equation and its solution have already been taken up
in the framework of functional analysis, where they have led to an extension of
potential theory [21]. We will now show that there is an underlying linear structure
that is hidden behind the solution. It can be stated as follows; compare [5, Sec. 3.2]
for details.

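Theorem 2. Let $\{ c_{G'}^{(L')}(t) \mid \varnothing \subseteq G' \subseteq L' \subseteq L \}$ be a family of non-negative
functions with $c_G^{(L)}(t) = c_{G_1}^{(L_1)}(t)\, c_{G_2}^{(L_2)}(t)$, valid for any partition $L = L_1 \,\dot\cup\, L_2$ of the
set $L$ and all $t \geq 0$, where $G_i := G \cap L_i$. Assume further that these functions
satisfy $\sum_{H \subseteq L'} c_H^{(L')}(t) = 1$ for any $L' \subseteq L$ and $t \geq 0$. If $v \in \mathcal{M}_{\geq 0}(X)$ and $H \subseteq L$,
one has the identity

$$R_H \Bigl( \sum_{G \subseteq L} c_G^{(L)}(t)\, R_G(v) \Bigr) = \sum_{G \subseteq L} c_G^{(L)}(t)\, R_{G \cup H}(v),$$

which is then satisfied for all $t \geq 0$. □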

Here, the upper index specifies the respective set of links. Clearly, the coefficient
functions $a_G(t)$ of Theorem 1 satisfy the conditions of Theorem 2. The result
then means that the recombinators act linearly along the solutions (7) of the
recombination equation (4). Theorem 2 thus has the consequence that, on $\mathcal{M}_{\geq 0}(X)$,
the forward flow of (4) commutes with all recombinators, that is, $R_G \circ \varphi_t = \varphi_t \circ R_G$
for all $t \geq 0$ and all $G \subseteq L$.

But let us go one step further here. The conventional approach to solve the
recombination dynamics consists in transforming the type frequencies to certain
functions (known as principal components) that diagonalise the dynamics, see
[10, 18] and references therein for more. We will now show that, in continuous
time, they have a particularly simple structure: they are given by certain
correlation functions, known as linkage disequilibria (LDE) in biology, which play
an important role in applications. They have a counterpart at the level of operators
(on $\mathcal{M}_{\geq 0}(X)$). Namely, let us define LDE operators via

$$T_G := \sum_{\underset{\cdot}{H} \supseteq G} (-1)^{|H \setminus G|} R_H, \quad G \subseteq L, \qquad (9)$$

where the underdot indicates the summation variable. Note that $T_G$ maps $\mathcal{M}_{\geq 0}(X)$
into $\mathcal{M}(X)$, the set of signed measures on $X$. Eq. (9) leads to the inverse
$R_G = \sum_{\underset{\cdot}{H} \supseteq G} T_H$ by the combinatorial Möbius inversion formula, see [1, Thm. 4.18]. We
then have

Theorem 3. If $\omega_t$ is the solution (7), the transformed quantities $T_G(\omega_t)$ satisfy

$$\frac{d}{dt}\, T_G(\omega_t) = - \Bigl( \sum_{\alpha \in L \setminus G} \varrho_\alpha \Bigr)\, T_G(\omega_t), \quad G \subseteq L. \qquad (10)$$

Proof. See Sec. 3.3]. □

Obviously, Eq. (10) is a system of decoupled, linear, homogeneous differential
equations with the usual exponential solution. Note that this simple form emerged
through the nonlinear transform (9) as applied to the solution of the coupled,
nonlinear differential equation (4).

Suitable components of the signed measure $T_G(\omega_t)$ may then be identified to
work with in practice (see [51[S] for details); they correspond to correlation functions
of all orders and decouple and decay exponentially. These functions turn out to be
particularly well-adapted to the problem since they rely on ordered partitions, in
contrast to conventional LDE's used elsewhere in population genetics, which rely
on general partitions (see O Ch. V.4] for review).
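For orientation, the simplest case can be worked out by hand. The following derivation is a sketch for two sites and a single link, so that $S = \{0, 1\}$ and $L = \{\tfrac{1}{2}\}$; the shorthands $R := R_{1/2}$ and $\varrho := \varrho_{1/2}$ are introduced here only for this example.

```latex
% Two sites, one link: S = \{0,1\}, L = \{1/2\}, R := R_{1/2}, \varrho := \varrho_{1/2}.
\begin{align*}
T_{\varnothing} &= R_{\varnothing} - R_{\{1/2\}} = \mathbb{1} - R,
  \qquad T_{\{1/2\}} = R, \\
\frac{d}{dt}\, T_{\varnothing}(\omega_t) &= -\varrho\, T_{\varnothing}(\omega_t)
  \;\Longrightarrow\; T_{\varnothing}(\omega_t) = e^{-\varrho t}\, T_{\varnothing}(\omega_0), \\
\frac{d}{dt}\, T_{\{1/2\}}(\omega_t) &= 0
  \;\Longrightarrow\; R(\omega_t) = R(\omega_0), \\
\omega_t &= T_{\varnothing}(\omega_t) + T_{\{1/2\}}(\omega_t)
  = e^{-\varrho t}\, \omega_0 + \bigl( 1 - e^{-\varrho t} \bigr) R(\omega_0).
\end{align*}
```

The last line reproduces the closed solution for a single link. For a normalised population $p$, the component $T_{\varnothing}(p)(x_0, x_1) = p(x_0, x_1) - (\pi_0.p)(x_0)\,(\pi_1.p)(x_1)$ is the classical two-locus linkage disequilibrium, which thus decays at rate $\varrho$.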



3. Stochastic dynamics, continuous time 

3.1. The model. The effect of finite population size in population genetics is, 
in continuous time, well captured by the Moran model. It describes a population 
of fixed size N and takes into account the stochastic fluctuations due to random 
reproduction, which manifest themselves via a resampling effect (known as genetic 
drift in biology). More precisely, the finite-population counterpart of our determin- 
istic model is the Moran model with single-crossover recombination. To simplify 
matters (and in order to clearly dissect the individual effects of recombination and 
resampling), we shall use the decoupled (or parallel) version of the model, which 
assumes that resampling and recombination occur independently of each other, as 
illustrated in Fig. 4. More precisely, in our finite population of fixed size $N$, every
individual experiences, independently of the others,

• resampling at rate $b/2$. The individual reproduces, the offspring inherits the
parent's type and replaces a randomly chosen individual (possibly its own
parent).

• recombination at (overall) rate $\varrho_\alpha$ at link $\alpha \in L$. Every individual picks a
random partner (maybe itself) at rate $\varrho_\alpha/2$, and the pair exchanges the sites
after link $\alpha$. That is, if the recombining individuals have types $x$ and $y$, they
are replaced by the two offspring individuals $(x_{<\alpha}, y_{>\alpha})$ and $(y_{<\alpha}, x_{>\alpha})$,
as in the deterministic case (see Fig. 3). As before, the per-capita rate of
recombination at link $\alpha$ is then $\varrho_\alpha$, because both orderings of the individuals
lead to the same type count in the population.




Figure 4. Graphical representation of the Moran model (with parallel resampling and 
recombination). Every individual is represented by a vertical line; time runs down the 
page. Resampling is indicated by arrows, with the parent individual at the tail and the 
offspring at the tip. Recombination is depicted by a crossing between two individuals. 
Note that the spatial information suggested by the graphical representation does not play 
a role in the model; one is only interested in the frequencies of the various types. 

Note that the randomly chosen second individual (for resampling or recombi- 
nation) may be the active individual itself; then, effectively, nothing happens. One 
might, for biological reasons, prefer to exclude these events by sampling from the 
remaining population only; but this means nothing but a change of time scale of 
order 1/N. 

To formalise this verbal description of the process, let the state of the population
at time $t$ be given by the collection (the random vector)

$$Z_t = \bigl( Z_t(x) \bigr)_{x \in X} \in E := \Bigl\{ z \in \{0, 1, \dots, N\}^{|X|} \;\Big|\; \sum_{x} z(x) = N \Bigr\},$$

where $Z_t(x)$ is the number of individuals of type $x$ at time $t$; clearly, $\sum_{x \in X} Z_t(x) = N$.
We also use $Z_t$ in the sense of a (random counting) measure, in analogy with $\omega_t$
(but keep in mind that $Z_t$ is integer-valued and counts single individuals, whereas
$\omega_t$ denotes continuous frequencies in an infinite population). The letter $z$ will be
used to denote realisations of $Z_t$; note, however, that the symbols $x$, $y$, and $z$ are not on
equal footing ($x$ and $y$ will continue to be types). The stochastic process $\{Z_t\}_{t \geq 0}$
is the continuous-time Markov chain on $E$ defined as follows. If the current state
is $Z_t = z$, two types of transitions may occur:

resampling: $z \to z + s(x, y)$, $\quad s(x, y) := \delta_x - \delta_y$,

at rate $\frac{1}{2N}\, b\, z(x)\, z(y)$ for $(x, y) \in X \times X$; (11)

recombination: $z \to z + r(x, y, \alpha)$,

$r(x, y, \alpha) := \delta_{(x_{<\alpha},\, y_{>\alpha})} + \delta_{(y_{<\alpha},\, x_{>\alpha})} - \delta_x - \delta_y$,

at rate $\frac{1}{2N}\, \varrho_\alpha\, z(x)\, z(y)$ for $(x, y) \in X \times X$, $\alpha \in L$ (12)

(where $\delta_x$ is the point measure on $x$, as before). Note that, in (11) and (12),
transitions that leave $E$ are automatically excluded by the fact that the corresponding
rates vanish. On the other hand, 'empty transitions' ($s(x, y) = 0$ or $r(x, y, \alpha) = 0$)
are explicitly included (they occur if $x = y$ in resampling or recombination, and if
$x_{<\alpha} = y_{<\alpha}$ or $x_{>\alpha} = y_{>\alpha}$ in recombination).
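The transition rules can be turned into a small event-driven simulation. The sketch below is one possible reading of the verbal description, not a reference implementation: every individual triggers resampling at rate b/2 and recombination at each link at half the link's rate, the partner is drawn uniformly from the whole population (possibly the active individual itself), and links are again encoded by cut positions. The function name `simulate_moran` and all parameter values are hypothetical.

```python
import random
from collections import Counter

def simulate_moran(pop, b, rates, t_end, seed=0):
    """Simulate the Moran model with parallel resampling and recombination up to time t_end.

    pop   : list of types (tuples), one entry per individual
    b     : resampling parameter (each individual resamples at rate b/2)
    rates : dict mapping cut position -> recombination rate of that link
    Returns the type counts at time t_end as a Counter.
    """
    rng = random.Random(seed)
    pop = list(pop)
    n = len(pop)
    per_capita = 0.5 * (b + sum(rates.values()))   # total event rate per individual
    t = rng.expovariate(n * per_capita)            # time of the first event
    while t < t_end:
        i, j = rng.randrange(n), rng.randrange(n)  # active individual and random partner
        u = rng.random() * per_capita              # decides which kind of event occurs
        if u < 0.5 * b:
            pop[j] = pop[i]                        # resampling: offspring of i replaces j
        else:
            u -= 0.5 * b
            for cut, rho in rates.items():
                if u < 0.5 * rho:                  # recombination at this link
                    x, y = pop[i], pop[j]
                    pop[i], pop[j] = x[:cut] + y[cut:], y[:cut] + x[cut:]
                    break
                u -= 0.5 * rho
        t += rng.expovariate(n * per_capita)
    return Counter(pop)

start = [(0, 0, 0)] * 10 + [(1, 1, 1)] * 10
z = simulate_moran(start, b=1.0, rates={1: 0.5, 2: 0.5}, t_end=2.0, seed=1)
```

Two invariants are easy to check in such a simulation: the population size is always conserved, and without resampling (b = 0) the number of copies of each letter at each site stays fixed, since recombination only redistributes letters among individuals.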

3.2. Connecting stochastic and deterministic models. Let us
now explore the connection between the stochastic process $\{Z_t\}_{t \geq 0}$ on $E$, its
normalised version $\{\widehat{Z}_t\}_{t \geq 0} = \{Z_t\}_{t \geq 0}/N$ on $E/N$, and the solution $\omega_t = \varphi_t(\omega_0)$
(Eq. (7)) of the differential equation. It is easy to see (and no surprise) that

$$\frac{d}{dt}\, \mathbb{E}(Z_t) = \mathbb{E}\bigl( \Phi(Z_t) \bigr), \qquad (13)$$

with $\Phi$ of (4). But this does not, per se, lead to a 'closed' differential equation
for $\mathbb{E}(Z_t)$, because it is not clear whether $\mathbb{E}(\Phi(Z_t))$ can be written as a function of
$\mathbb{E}(Z_t)$ alone; after all, $\Phi$ is nonlinear. In the absence of resampling, however, we
have

Theorem 4. Let $\{Z_t\}_{t \geq 0}$ be the recombination process without resampling (i.e.,
$b = 0$), and let $Z_0$ be fixed. Then, $\mathbb{E}(Z_t)$ satisfies the differential equation

$$\frac{d}{dt}\, \mathbb{E}(Z_t) = \Phi\bigl( \mathbb{E}(Z_t) \bigr)$$

with initial value $Z_0$, and $\Phi$ from (4); therefore,

$$\mathbb{E}(Z_t) = \varphi_t(Z_0), \quad \text{for all } t \geq 0,$$

with $\varphi_t$ from (7). Likewise, for all $t \geq 0$,

$$\mathbb{E}(T_G Z_t) = T_G\bigl( \varphi_t(Z_0) \bigr).$$

Proof. See [6, Thm. 1 and Cor. 1]. □

The result again points to some underlying linearity, which, in the context of
the stochastic model, should be connected to some kind of independence. Indeed,
the key to the proof of Thm. 4 is a lemma concerning the independence of marginal
processes. For $I \subseteq S$, we introduce the 'stretch' of $I$ as

$$J(I) := \{ i \in S \mid \min(I) \leq i \leq \max(I) \},$$

and look at the projection of the recombination process on non-overlapping stretches.
This is the content of

Lemma 5. Let $\{Z_t\}_{t \geq 0}$ be the recombination process without resampling (i.e.,
$b = 0$). Let $A, B \subseteq S$ with $J(A) \cap J(B) = \varnothing$. Then, $\{\pi_A.Z_t\}_{t \geq 0}$ and $\{\pi_B.Z_t\}_{t \geq 0}$
are conditionally (on $Z_0$) independent Markov chains on $E_A$ and $E_B$.

Proof. See [6, Lemma 1]. □

Let us now re-include resampling, at rate $b/2 > 0$, and consider the stochastic
process $\{Z_t^{(N)}\}_{t \geq 0}$ defined by both (11) and (12), where we add the upper index
here to indicate the dependence on $N$. Now, Lemma 5 and Thm. 4 are no
longer valid. The processes $\{\pi_{<\alpha}.Z_t^{(N)}\}_{t \geq 0}$ and $\{\pi_{>\alpha}.Z_t^{(N)}\}_{t \geq 0}$ are still individually
Markov, but their resampling events are coupled (replacement of $y_{<\alpha}$ by $x_{<\alpha}$
is always tied to replacement of $y_{>\alpha}$ by $x_{>\alpha}$). Hence the marginal processes fail
to be independent, so that no equivalent of Lemma 5 holds.

Let us, therefore, change focus and consider the normalised version $\{\widehat{Z}_t^{(N)}\}_{t \geq 0} =
\{Z_t^{(N)}\}_{t \geq 0}/N$. In line with general folklore in population genetics, in the limit
$N \to \infty$, the relative frequencies $\{\widehat{Z}_t^{(N)}\}_{t \geq 0}$ cease to fluctuate and are then given
by the solution of the corresponding deterministic equation. More precisely, we
have

Proposition 6. Consider the family of processes $\{\widehat{Z}_t^{(N)}\}_{t \geq 0} = \frac{1}{N} \{Z_t^{(N)}\}_{t \geq 0}$,
$N = 1, 2, \dots$, where $\{Z_t^{(N)}\}_{t \geq 0}$ is defined by (11) and (12). Assume that the initial
states are chosen so that $\lim_{N \to \infty} \widehat{Z}_0^{(N)} = p_0$. Then, for every given $t \geq 0$, one has

$$\lim_{N \to \infty} \sup_{s \leq t} \bigl| \widehat{Z}_s^{(N)} - p_s \bigr| = 0 \qquad (14)$$

with probability 1, where $p_s := \varphi_s(p_0)$ is the solution of the deterministic
recombination equation (4). □

The proof is an elementary application of Thm. 11.2.1 of [12]; see Prop. 1 of 
[6] for the explicit workout. 

Note that the convergence in (14) applies for any given $t$, but need not carry
over to $t \to \infty$. Indeed, if resampling is present, the population size required to
get close to the deterministic solution is expected to grow over all bounds with 
increasing t. This is because, for every finite N, the Moran model with resampling 
and recombination is an absorbing Markov chain, which leads to fixation (i.e., to 
a homogeneous population of uniform type) in finite time with probability one 
(for the special case of just two types without recombination, the expected time 
is known to be of order $N$ if the initial frequencies are both 1/2 [13, p. 93]). In
sharp contrast, the deterministic system never loses any type, and the stationary
state, the complete product measure (8), is, in a sense, even the most variable
state accessible to the system. For increasing N, finite populations stay close to 
the deterministic limit for an increasing length of time. 



4. Discrete time 

Let us return to the deterministic setting and consider the discrete-time version of
our single-crossover dynamics, that is,

$$\omega_{t+1} = \omega_t + \sum_{\alpha \in L} \tilde{\varrho}_\alpha (R_\alpha - \mathbb{1})(\omega_t) =: \tilde{\Phi}(\omega_t). \qquad (15)$$

Here, the coefficients $\tilde{\varrho}_\alpha > 0$, $\alpha \in L$, are the probabilities for a crossover at link
$\alpha$ in every generation (as opposed to the rates $\varrho_\alpha$ of the continuous-time setting).
Consequently, we must have $0 < \sum_{\alpha \in L} \tilde{\varrho}_\alpha \leq 1$.

Based on the result for the continuous-time model, the solution is expected to
be of the form

$$\omega_t = \tilde{\Phi}^t(\omega_0) = \sum_{G \subseteq L} \tilde{a}_G(t)\, R_G(\omega_0), \qquad (16)$$

with non-negative $\tilde{a}_G(t)$, $G \subseteq L$, $\sum_{G \subseteq L} \tilde{a}_G(t) = 1$, describing the (still unknown)
coefficient functions arising from the dynamics. This representation of the solution
was first stated by Geiringer [14]. The coefficient functions will have the
same probabilistic interpretation as the corresponding $a_G(t)$ in the continuous-time
model, so that $\tilde{a}_G(t)$ is the probability that the links that have been involved in
recombination until time $t$ are exactly those of the set $G$.

But there is a crucial difference. Recall that, in continuous time, single crossovers
imply independence of links, which is expressed in the product structure
of the $a_G(t)$ (see Thm. 1). This independence is lost in discrete time, where a
crossover event at one link forbids any other cut at other links in the same time 
step. It is therefore not surprising that a closed solution is not available in this 
case. It will, however, turn out that a solution can be stated in terms of the 
(generalised) eigenvalues of the system (which are known explicitly) , together with 
coefficients to be determined via a simple recursion. But it is rewarding to take a 
closer look at the dynamics first. 

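Let us introduce the following abbreviations:

$$L_{\leq \alpha} := \{ i \in L \mid i \leq \alpha \}, \qquad L_{\geq \alpha} := \{ i \in L \mid i \geq \alpha \},$$

and, for each $G \subseteq L$,

$$G_{<\alpha} := \{ i \in G \mid i < \alpha \}, \qquad G_{>\alpha} := \{ i \in G \mid i > \alpha \}.$$

Furthermore, we set $\eta := 1 - \sum_{\alpha \in L} \tilde{\varrho}_\alpha$. The dynamics (15) is then reflected in the
following dynamics of the coefficient functions: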
Theorem 7. For all $G \subseteq L$ and $t \in \mathbb{N}_0$, the coefficient functions $\tilde{a}_G(t)$ evolve
according to

$$\tilde{a}_G(t+1) = \eta\, \tilde{a}_G(t) + \sum_{\alpha \in G} \tilde{\varrho}_\alpha \Bigl( \sum_{H \subseteq L_{\geq \alpha}} \tilde{a}_{G_{<\alpha} \cup H}(t) \Bigr) \Bigl( \sum_{K \subseteq L_{\leq \alpha}} \tilde{a}_{K \cup G_{>\alpha}}(t) \Bigr), \qquad (17)$$

with initial condition $\tilde{a}_G(0) = \delta_{G, \varnothing}$. □

A verbal description of this dynamics was already given by Geiringer [14]; a
formal proof may be found in [24, Thm. 3].

The above iteration is easily understood intuitively: A type $x$ resulting from
recombination at link $\alpha$ is composed of two segments $x_{<\alpha}$ and $x_{>\alpha}$. These segments
themselves may have been pieced together in previous recombination events
already, and the iteration explains the possible cuts these segments may carry along.
The first term in the product stands for the type delivering the leading segment
(which may bring along arbitrary cuts in the trailing segment), the second for the
type delivering the trailing one (here any leading segment is allowed). The term
$\eta\, \tilde{a}_G(t)$ covers the case of no recombination.
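The iteration is straightforward to run on a computer. The sketch below rests on one explicit assumption: the inner sums range over all subsets of the links at or after the cut, and at or before it, respectively. This matches the verbal description, since the partner segment may carry arbitrary cuts including one at the link itself, and it keeps the coefficients summing to one. Links are labelled by integers (1 for the link 1/2, 2 for 3/2, and so on), and the crossover probabilities are illustrative values.

```python
from itertools import combinations

def subsets(items):
    """All subsets of the given items, as frozensets."""
    items = list(items)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def step(a, rho):
    """One generation of the coefficient recursion; a maps frozenset(G) -> coefficient of G."""
    links = sorted(rho)
    eta = 1.0 - sum(rho.values())                  # probability of no crossover at all
    new = {}
    for G in a:
        total = eta * a[G]
        for alpha in G:
            g_lo = frozenset(i for i in G if i < alpha)
            g_hi = frozenset(i for i in G if i > alpha)
            # leading segment: cuts before alpha fixed, anything allowed from alpha on
            lead = sum(a[g_lo | H] for H in subsets(i for i in links if i >= alpha))
            # trailing segment: cuts after alpha fixed, anything allowed up to alpha
            trail = sum(a[K | g_hi] for K in subsets(i for i in links if i <= alpha))
            total += rho[alpha] * lead * trail
        new[G] = total
    return new

def coefficients(rho, t):
    """Coefficient functions after t generations, starting from the uncut state."""
    a = {G: 0.0 for G in subsets(rho)}
    a[frozenset()] = 1.0
    for _ in range(t):
        a = step(a, rho)
    return a

rho = {1: 0.1, 2: 0.2, 3: 0.05}   # hypothetical crossover probabilities for links 1/2, 3/2, 5/2
a2 = coefficients(rho, 2)
a5 = coefficients(rho, 5)
```

For these values the coefficient of the empty set after five generations is simply the fifth power of the no-crossover probability 0.65, and the coefficients sum to one in every generation.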

Let us now have a closer look at the structure of the dependence between links
in discrete time. To this end, note first that the set $G = \{\alpha_1, \dots, \alpha_{|G|}\} \subseteq L$ with
$\alpha_1 < \alpha_2 < \cdots < \alpha_{|G|}$ partitions $L \setminus G$ into $\mathcal{L}_G := \{ I_0^G, I_1^G, \dots, I_{|G|}^G \}$, where

$$I_0^G = \{ \alpha \in L : \tfrac{1}{2} \leq \alpha < \alpha_1 \}, \qquad I_{|G|}^G = \{ \alpha \in L : \alpha_{|G|} < \alpha \leq \tfrac{2n-1}{2} \}, \qquad (18)$$

and $I_\ell^G = \{ \alpha \in L : \alpha_\ell < \alpha < \alpha_{\ell+1} \}$ for $1 \leq \ell \leq |G| - 1$.

Cutting all links in $G$ decomposes the original system (of sites and links) into
subsystems which are independent of each other from then on. In particular, the
links in $I_j^G$ become independent of those in $I_k^G$, for $k \neq j$. The probability that none
of these subsystems experiences any further recombination is

$$\lambda_G = \prod_{i=0}^{|G|} \Bigl( 1 - \sum_{\alpha_i \in I_i^G} \tilde{\varrho}_{\alpha_i} \Bigr). \qquad (19)$$

In particular, $\lambda_\varnothing = \eta = 1 - \sum_{\alpha \in L} \tilde{\varrho}_\alpha \geq 0$. The $\lambda_G$ are, at the same time, the
generalised eigenvalues that appear when the system is diagonalised and have been
previously identified by Bennett [8], Lyubich [18] and Dawson [10].
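The generalised eigenvalues are easy to compute once the blocks between consecutive cut links are tracked. A minimal sketch, with links labelled by integers in their natural order and a hypothetical function name `eigenvalue`:

```python
def eigenvalue(G, rho):
    """Probability that no block of uncut links between the cuts in G sees a crossover.

    G   : set of cut links
    rho : dict mapping link label -> crossover probability per generation
    """
    lam, block = 1.0, 0.0
    for alpha in sorted(rho):
        if alpha in G:
            lam *= 1.0 - block   # close the current block of uncut links
            block = 0.0
        else:
            block += rho[alpha]  # link belongs to the current block
    return lam * (1.0 - block)   # last block, after the final cut

rho = {1: 0.1, 2: 0.2, 3: 0.05}  # hypothetical values for the links 1/2, 3/2, 5/2
lam_empty = eigenvalue(set(), rho)    # no cuts: one block containing all links
lam_12 = eigenvalue({1, 2}, rho)      # only link 5/2 remains uncut
lam_2 = eigenvalue({2}, rho)          # blocks {1/2} and {5/2}
```

For the empty set this gives one minus the sum of all crossover probabilities, and for the full set of links it gives 1, since nothing is left to be cut.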

A most instructive way to detail the effect of dependence is the ancestral
recombination process: start from an individual in the present population, let time run
backwards and consider how this individual's type has been pieced together from
different fragments in the past. In the four-sites example of Fig. 5, the probability
that exactly 1/2 and 3/2 have been cut reads

$$\tilde{a}_{\{\frac{1}{2}, \frac{3}{2}\}}(t) = \tilde{\varrho}_{\frac{1}{2}} \tilde{\varrho}_{\frac{3}{2}} \sum_{k=0}^{t-2} \lambda_\varnothing^{k} \sum_{i=0}^{t-2-k} \lambda_{\{\frac{1}{2}\}}^{i}\, \lambda_{\{\frac{1}{2}, \frac{3}{2}\}}^{t-2-k-i}
 + \tilde{\varrho}_{\frac{1}{2}} \tilde{\varrho}_{\frac{3}{2}} \bigl( 1 - \tilde{\varrho}_{\frac{5}{2}} \bigr) \sum_{k=0}^{t-2} \lambda_\varnothing^{k} \sum_{i=0}^{t-2-k} \lambda_{\{\frac{3}{2}\}}^{i}\, \lambda_{\{\frac{1}{2}, \frac{3}{2}\}}^{t-2-k-i}. \qquad (20)$$

Here, the first (second) term corresponds to the possibility that link 1/2 (3/2) is
the first to be cut. Obviously, the two possibilities are not symmetric: If 3/2 is the
first to break, an additional factor of $(1 - \tilde{\varrho}_{5/2})$ is required to guarantee that, at
the time of the second recombination event (at 1/2), the trailing segment (sites 2
and 3) remains intact while the leading segment (sites 0 and 1) is cut.
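To see the asymmetry at work, the two terms can be evaluated for small $t$. The following worked example assumes that the double sums run over $0 \leq k \leq t-2$ and $0 \leq i \leq t-2-k$ (waiting times before the first and between the two cuts), and writes $\tilde{\varrho}_{1/2}, \tilde{\varrho}_{3/2}, \tilde{\varrho}_{5/2}$ for the three crossover probabilities.

```latex
% Four sites, links 1/2, 3/2, 5/2; generalised eigenvalues of the relevant link sets:
%   \lambda_\varnothing = 1 - \tilde\varrho_{1/2} - \tilde\varrho_{3/2} - \tilde\varrho_{5/2},
%   \lambda_{\{1/2\}} = 1 - \tilde\varrho_{3/2} - \tilde\varrho_{5/2},
%   \lambda_{\{3/2\}} = (1 - \tilde\varrho_{1/2})(1 - \tilde\varrho_{5/2}),
%   \lambda_{\{1/2,3/2\}} = 1 - \tilde\varrho_{5/2}.
\begin{align*}
\tilde a_{\{1/2,3/2\}}(2)
 &= \tilde\varrho_{1/2}\tilde\varrho_{3/2}
  + \tilde\varrho_{1/2}\tilde\varrho_{3/2}\,(1-\tilde\varrho_{5/2})
  = \tilde\varrho_{1/2}\tilde\varrho_{3/2}\,(2-\tilde\varrho_{5/2}), \\
\tilde a_{\{1/2,3/2\}}(3)
 &= \tilde\varrho_{1/2}\tilde\varrho_{3/2}
    \bigl(\lambda_\varnothing + \lambda_{\{1/2\}} + \lambda_{\{1/2,3/2\}}\bigr)
  + \tilde\varrho_{1/2}\tilde\varrho_{3/2}\,(1-\tilde\varrho_{5/2})
    \bigl(\lambda_\varnothing + \lambda_{\{3/2\}} + \lambda_{\{1/2,3/2\}}\bigr).
\end{align*}
```

With the illustrative values $\tilde{\varrho}_{1/2} = 0.1$, $\tilde{\varrho}_{3/2} = 0.2$, $\tilde{\varrho}_{5/2} = 0.05$, this gives $0.039$ for $t = 2$ and $0.02 \cdot 2.35 + 0.019 \cdot 2.455 = 0.093645$ for $t = 3$.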




Figure 5. The ancestral recombination process: possible histories of the sequence 0123
(at the bottom). The two panels illustrate the two terms of $\tilde{a}_{\{1/2, 3/2\}}(t)$ in Eq. (20) (left:
link 1/2 is cut first; right: link 3/2 is cut first). Arrows point in the backward direction
of time. Blank lines indicate arbitrary leading or trailing segments with which parts of
the sequence have joined during recombination (they correspond to the asterisks (*) in
Fig. 3). The probability that nothing happens for a while (straight arrows only) is given
by (powers of) the generalised eigenvalues (19).



Despite these complications, the discrete-time dynamics can again be solved, 
even directly at the level of the $a_G(t)$, albeit slightly less explicitly than in continuous
time. Indeed, it may be shown (and will be detailed in a forthcoming paper)
that the coefficient functions have the form

\[
a_G^{(L)}(t) = \sum_{H \subseteq G} \gamma_G^{(L)}(H)\, \bigl( \lambda_H^{(L)} \bigr)^{t},
\]



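where the upper index has again been added to indicate the dependence on the
system. The coefficients $\gamma_G^{(L)}(H)$ ($H \subseteq G$) are defined recursively as follows. For
$\alpha \in L$, let $L_{<\alpha}$ and $L_{>\alpha}$ denote the sets of links below and above $\alpha$, and write
$G_{<\alpha} := G \cap L_{<\alpha}$ and $G_{>\alpha} := G \cap L_{>\alpha}$ (likewise for $H$). Then, for $G \neq \varnothing$,

\[
\gamma_G^{(L)}(H) =
\begin{cases}
\dfrac{1}{\lambda_H^{(L)} - \lambda_\varnothing^{(L)}} \displaystyle\sum_{\alpha \in H} \rho_\alpha\,
\gamma_{G_{<\alpha}}^{(L_{<\alpha})}(H_{<\alpha})\, \gamma_{G_{>\alpha}}^{(L_{>\alpha})}(H_{>\alpha}), & H \neq \varnothing, \\[2ex]
- \displaystyle\sum_{\varnothing \neq J \subseteq G} \gamma_G^{(L)}(J), & H = \varnothing.
\end{cases}
\tag{21}
\]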

Together with the initial value $\gamma_\varnothing^{(L)}(\varnothing) = 1$, this may be solved recursively.
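
The recursive solution can be sketched in a few lines of code. The following is a minimal illustration, assuming that the recursion takes the form obtained by conditioning on the first link $\alpha$ to be cut: for nonempty $H$, the coefficient $\gamma_G^{(L)}(H)$ is the sum over $\alpha \in H$ of $\rho_\alpha$ times the product of the coefficients of the two subsystems to the left and right of $\alpha$, divided by $\lambda_H^{(L)} - \lambda_\varnothing^{(L)}$, while the coefficient for $H = \varnothing$ is minus the sum of all others. All function names are hypothetical, all $\rho_\alpha$ are assumed strictly positive (so that the denominator does not vanish), and exact rational arithmetic is used.

```python
from fractions import Fraction
from itertools import combinations


def blocks(H, links):
    """Runs of uncut links between the cut links in H (possibly empty)."""
    parts, current = [], []
    for alpha in links:
        if alpha in H:
            parts.append(current)
            current = []
        else:
            current.append(alpha)
    parts.append(current)
    return parts


def lam(H, links, rho):
    """Generalised eigenvalue of the (sub)system whose ordered link set is `links`."""
    out = Fraction(1)
    for block in blocks(H, links):
        out *= 1 - sum(rho[a] for a in block)
    return out


def subsets(G):
    items = sorted(G)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def gamma(G, H, links, rho):
    """Coefficient of lam(H)**t in a_G(t), under the assumed first-cut recursion."""
    if not G:
        return Fraction(1)                       # initial value for the empty set
    if not H:
        return -sum(gamma(G, J, links, rho) for J in subsets(G) if J)
    total = Fraction(0)
    for alpha in H:                              # alpha is the first link to be cut
        left = tuple(a for a in links if a < alpha)
        right = tuple(a for a in links if a > alpha)
        total += (rho[alpha]
                  * gamma(G & frozenset(left), H & frozenset(left), left, rho)
                  * gamma(G & frozenset(right), H & frozenset(right), right, rho))
    return total / (lam(H, links, rho) - lam(frozenset(), links, rho))


def a_closed(G, t, links, rho):
    """a_G(t) as a linear combination of t-th powers of generalised eigenvalues."""
    G = frozenset(G)
    return sum(gamma(G, H, links, rho) * lam(H, links, rho) ** t for H in subsets(G))


# Hypothetical example: three links with made-up recombination probabilities.
half = Fraction(1, 2)
links = (half, 3 * half, 5 * half)
rho = dict(zip(links, (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))))
for G in subsets(links):
    print(sorted(G), [a_closed(G, t, links, rho) for t in range(4)])
```

The recursion descends to strictly smaller subsystems at each step, so it terminates; for long sequences one would cache intermediate coefficients, which this sketch omits for brevity.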

A diagonalisation of the system (analogous to that in Thm. 3) may also be
achieved via a related, albeit technically more involved recursion [24].

5. Concluding remarks and outlook

The results presented here can naturally only represent a narrow segment from a 
large area with lively recent and current activities. Let us close this contribution 
by mentioning some important further directions in the context of recombination. 

Our restriction to single crossovers provided a starting point with a uniquely 
transparent structure (mainly due to the independence of links in continuous time).
However, arbitrary recombination patterns (which partition the set of links into
two arbitrary parts) can also be dealt with, as has been done for the deterministic
case in [18, 10]. The underlying mathematical structure will be further investigated
in a forthcoming paper, for both the deterministic and the stochastic models. 

Above, genetic material was exchanged reciprocally at recombination events, 
so that the length of each sequence remains constant. But sequences may also shift 
relative to each other before cutting and relinking (so-called unequal crossover), 
which entails changes in length; see [4] and references therein.

The most important aspect of modern population genetics is the backward- 
in-time point of view. This is natural because evolution is mainly a historical 
science and today's researchers try to infer the past from samples of individuals 
taken from present-day populations. We have hinted at this with our version of 
an ancestral recombination process, but would like to emphasise that this is only 
a toy version. The full version of this process also takes into account resampling 
(as in Sec. 3 with b > 0) and aims at the law of genealogies of samples from finite
populations. This point of view was introduced by Hudson [16]. The fundamental
concept here is the ancestral recombination graph: a branching-coalescing graph,
where branching (backwards in time) comes about by recombination (as in Fig. 5),
but lines may also coalesce where two individuals go back to a common ancestor 
(this corresponds to a resampling event forward in time). For recent introductions
to this topic, see [11, Chap. 3], [15, Chap. 5], or [23, Chap. 7]; these texts also
contain overviews of how recombination may be inferred from genomic datasets.

Last but not least, recombination and resampling are but two of the various
processes that act on genes in populations. Further inclusion of mutation and/or
selection leads to a wealth of challenging problems, whose investigation has
stimulated the exploration of new mathematical structures, concepts, and methods;
let us only mention [7], [50], and [17] as recent examples. This development is
expected to continue and intensify in the years to come, not least because it
concerns the processes that have shaped present-day genomes.